Life Sciences Business Systems and Methods

CROSS REFERENCE TO RELATED APPLICATION 

[0001] This application is a continuation-in-part of U.S. Patent
Application Serial No. 10/107,508, filed March 26, 2002, entitled
"Pharmaceutical and Diagnostic Business Systems and Methods," which is
assigned to the assignee of this application and incorporated herein by
reference.

BACKGROUND

[0002] Biotechnology/biopharmaceutical companies have found 
significant commercial success in business methods wherein a biotechnology 
company partners with a large pharmaceutical company in pursuit of a 
particular scientific discovery. For example, it is common for biotechnology
companies to engage in various discovery processes (e.g., drug "target"
discovery processes) whereby they retain downstream intellectual property
rights and/or royalty streams. It is also common for biopharmaceutical
companies to collaborate with pharmaceutical companies for purposes of drug
discovery, wherein the biopharmaceutical companies use one of several
methods to identify regions of the genome that play a role in a particular
disease.

[0003] The DNA that makes up human chromosomes provides the 
instructions that direct the production of all proteins in the body. These proteins 
carry out vital functions of life. Variations in DNA are directly related to
almost all human diseases, including infectious diseases, cancers, inherited
disorders, and autoimmune disorders. Variations in DNA contributing to a
phenotypic change, such as a disease or a disorder, may result from a single
variation that disrupts the complex interactions of several genes or from any
number of mutations within a single gene. For example, Type I and II diabetes
have been linked to multiple genes, each with its own pattern of mutations. In
contrast, cystic fibrosis can be caused by any one of over 300 different
mutations in a single gene. Phenotypic changes may also result from variations
in non-coding regions of the genome. For example, a single nucleotide
variation in a regulatory region can upregulate or downregulate gene expression
or alter gene activity.

[0004] Recent technological developments in the field of human genomics
have enabled the development of pharmacogenomics, the use of human DNA
sequence variability in the development and prescription of drugs.
Pharmacogenomics is based on the correlation or association between a given
genotype and a resulting phenotype. Since the first correlation study over half
a century ago linking adverse drug response with amino acid variations in two
drug-metabolizing enzymes (plasma cholinesterase and glucose-6-phosphate
dehydrogenase), other correlation studies have linked sequence polymorphisms
in drug metabolism enzymes, drug targets and drug transporters with
compromised levels of drug efficacy or safety.

[0005] Pharmacogenomics information is especially useful in clinical
settings where correlation information is used to prevent drug toxicities. For
example, patients are often screened for genetic differences in the thiopurine
methyltransferase gene that cause decreased metabolism of 6-mercaptopurine or
azathioprine. However, only a small percentage of observed drug toxicities
have been explained adequately by the set of pharmacogenomic markers
available to date. In addition, "outlier" individuals, or individuals experiencing
unanticipated effects in clinical trials (when administered drugs that have
previously been demonstrated to be both safe and efficacious), cause substantial
delays in obtaining FDA drug approval and may even cause certain drugs to
come off the market, though such drugs may be efficacious for a majority of
recipients.

[0006] The various biotechnological methods used to date to identify
target genomic regions include, for example, differential gene expression,
which essentially looks for differences in gene expression between control and
case samples; protein-protein interaction maps, which are used to identify drug
receptors and their immediate effectors; and mining human sequence databases
for sequences similar to known disease-related, pharmacokinetic or
pharmacodynamic regulators. In comparison, association studies that correlate
and validate genomic regions with a particular phenotypic trait rely on
population genetics and robust statistical metrics. Association studies provide a
powerful tool to obtain greater amounts of information in a shorter amount of
time, thus reducing costs of research and development efforts.

[0007] Because all humans are 99.9% identical in their genetic makeup,
the DNA sequence of any two individuals is nearly identical. Variations
between individuals include, for example, deletions or insertions of DNA
sequences, variations in the number of repetitive DNA elements in non-coding
regions and changes in a single nitrogenous base position, or "single nucleotide
polymorphisms" (SNPs). It is estimated that there are 3 to 4 million common
SNPs that occur in at least 10 percent of people. These common SNPs do not
occur independently but are inherited from generation to generation in tandem
with other SNPs, forming patterns across the genome. Such groups of SNPs are
referred to as SNP haplotype blocks, or simply haplotypes, herein.

[0008] Common SNPs are useful for conducting whole-genome
association studies. Whole genomes of individuals with and without a
phenotypic trait (e.g., resistance to a disease, toxicity from a drug) are
scanned, and a correlation is made between SNPs of the case group and a
particular phenotypic state. Such whole-genome analyses provide a fine degree
of genetic mapping and can pinpoint specific regions of linkage. Methods for
whole genome analysis are described in U.S. Ser. No. 60/327,006, filed October 5,
2001, "Identifying Human SNP Haplotypes, Informative SNPs and Uses
Thereof," assigned to the assignee of the present invention (Attorney Docket
No. 054801-5001-P2; 1005-P3) and U.S. Ser. No. 10/106,097, "Methods For
Genomic Analysis" (Attorney Docket No. 1005-U1), both incorporated herein
by reference for all purposes. Further, the identity of SNPs and SNP haplotype
blocks across one representative chromosome, e.g., Chromosome 21, is
disclosed in U.S. Ser. No. 60/323,059, filed September 18, 2001, "Human
Genomic Polymorphisms," assigned to the assignee of the present invention
(Attorney Docket No. 1013P-1) and U.S. Ser. No. 10/284,444, entitled "Human
Genomic Polymorphisms" (Attorney Docket No. 1013U-2), incorporated herein
by reference for all purposes. See also Patil, N. et al., "Blocks of Limited
Haplotype Diversity Revealed by High-Resolution Scanning of Human
Chromosome 21," Science 294, 1719-1723 (2001), disclosing SNPs and
haplotype structure of Chromosome 21.
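
The case/control comparison described in paragraph [0008] can be illustrated with a minimal sketch. It assumes a single biallelic SNP, allele counts already tallied for a case group and a control group, and a Pearson chi-square statistic as the measure of association. The `AlleleCounts` type and `chi_square_2x2` function are hypothetical names introduced only for illustration; the source does not specify which statistical test is used.

```python
from dataclasses import dataclass


@dataclass
class AlleleCounts:
    """Counts of the two alleles of one SNP within one group (illustrative)."""
    allele_a: int
    allele_b: int


def chi_square_2x2(cases: AlleleCounts, controls: AlleleCounts) -> float:
    """Return the Pearson chi-square statistic for a 2x2 allele-by-group table."""
    observed = [
        [cases.allele_a, cases.allele_b],
        [controls.allele_a, controls.allele_b],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("each row and column needs at least one observation")

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if allele and phenotype were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic


if __name__ == "__main__":
    # Made-up counts: allele A is enriched in the case group.
    print(chi_square_2x2(AlleleCounts(30, 10), AlleleCounts(10, 30)))
```

With one degree of freedom, a statistic above roughly 3.84 corresponds to a nominal p-value below 0.05. A whole-genome scan tests many SNPs at once, so a real analysis would also need a correction for multiple comparisons, which this sketch omits.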

[0009] It is desirable to establish new and useful business methods to 
capitalize on these technological and scientific developments in genetics. 

BRIEF SUMMARY 

[00010] Improved business methods, devices, and systems for utilizing 
genetic information are provided. 

[00011] According to one embodiment of the invention, a business method 
is disclosed that includes the steps of scanning the human genome for single 
nucleotide polymorphisms across more than 10,000,000 bases including both 
genic and non-genic regions; grouping the single nucleotide polymorphisms
into haplotype blocks; using the haplotype blocks in association studies with a 
phenotypic state; using associations in a discovery process; and marketing 
products from the discovery process. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[00012] Figure 1 is a flow chart illustrating aspects of the business method 
herein. 

[00013] Figure 2 is a flow chart illustrating an automotive aspect of the 
business method herein. 

DETAILED DESCRIPTION 

[00014] As used in the specification, "a" or "an" means one or more. As 
used in the claim(s), when used in conjunction with the word "comprising", the 
words "a" or "an" mean one or more. As used herein, "another" means at least 
a second or more. 

[00015] Reference now will be made in detail to various embodiments 
and particular applications of the invention. While the invention will be 
described in conjunction with the various embodiments and applications, it will 
be understood that such embodiments and applications are not intended to limit 
the invention. On the contrary, the invention is intended to cover alternatives, 
modifications and equivalents that may be included within the spirit and scope 
of the invention. In addition, throughout this disclosure various patents, patent 
applications, websites and publications are referenced. Unless otherwise 
indicated, each is incorporated by reference in its entirety for all purposes.
Processes that may be used in specific embodiments of the methods herein are
described in more detail in the following patent applications, all of which are
specifically incorporated herein by reference: United States provisional patent
application Ser. No. 60/280,530, filed March 30, 2001; United States
provisional patent application Ser. No. 60/313,264, filed August 17, 2001;
United States provisional patent application Ser. No. 60/327,006, filed October
5, 2001, all entitled "Identifying Human SNP Haplotypes, Informative SNPs
and Uses Thereof"; U.S. provisional patent application Ser. No. 06/332,550,
filed November 26, 2002, (Attorney Docket No. 1005P-4), entitled "Methods
for Genomic Analysis"; U.S. Pat. Ser. No. 10/106,097, filed March 26, 2002,
(Attorney Docket No. 1005N-1), entitled "Methods for Genomic Analysis";
U.S. Ser. No. 10/042,819, filed January 7, 2002, entitled "Genetic Analysis
Systems and Methods"; and U.S. Ser. No. 10/284,444, filed September 18,
2001, entitled "Human Genomic Polymorphisms", the disclosures of all of which
are specifically incorporated herein by reference.

[00016] All publications mentioned herein are cited for the purpose of
describing and disclosing reagents, methodologies and concepts that may be
used with the present invention. Nothing herein is to be construed as an
admission that these references are prior art in relation to the inventions
described herein.

I. General.

[00017] Scientists have completed the first draft of the genetic sequence
of a human being, marking the beginning of a new era in biological research.
Although this in itself is a significant accomplishment, much of the business
value is in discovering patterns of variation between individuals and correlating
specific genomic regions with a phenotypic trait.

[00018] Sequencing the human genome has revealed that there is a high
degree of homology in genetic information between humans: any two humans
share approximately 99.9 percent of the same DNA sequence and have the same
up to 100,000 or so genes similarly situated in one of twenty-three
chromosomes. However, differences still exist. Approximately 0.1 percent, or
one out of every 1,000 DNA letters, is different between any two humans. This
accounts for an estimated 3 to 4 million common differences (or SNPs)
between individuals and many more rare ones.

[00019] Most diseases have a genetic basis that is often the result of
numerous genetic factors rather than just one gene. Therefore, when scientists
want to understand the genetic causes and resulting biological pathways
involved in disorders such as Alzheimer's, cancer or asthma, they need to
compare the DNA variations of the entire genome of many individuals who
have the disease to the genomes of many people who do not have the disease.

[00020] Further into the drug commercialization process, drug companies
invest hundreds of millions of dollars to develop a new product, only to suffer
large losses due to clinical trial participants having unpredictable effects (e.g.,
increased toxicity or inadequate or no response to the tested drug). In order to
overcome negative results, obtain regulatory approval faster and recoup losses,
drug companies need to associate effects with genetic profiles of clinical trial
participants. It would be very advantageous for drug companies to be able to
predict which individuals in a population will tolerate or respond positively to a
tested drug, and/or which individuals will experience negative side effects or no
significant improvement from the drug.

[00021] By scanning an initial set of genomes, the business methods
herein identify common SNPs and common "haplotype" patterns in which
SNPs occur. Once these haplotypes are known, assays can be used to
determine the genetic profiles of many individuals by reading only a few SNPs
from each known haplotype block or by reading all SNPs from each known
haplotype and using the haplotype blocks to ensure data integrity. The results
from these assays can be used, directly or indirectly, in drug discovery, clinical
trials and other discovery efforts with partners. The results can save partners
millions of dollars in unsuccessful clinical trials and fruitless research and
development efforts. Partners can include, for example, biotech partners,
pharmaceutical partners, consumer products partners (e.g., in the discovery of
shampoos, skin care products and others), agricultural partners, and/or other
partners (e.g., scientific partners or government partners).

[00022] The tremendous market advantage obtained by pharmaceutical
companies (or other companies) as a result of association studies is one of the
bases for the business methods and systems herein. In some cases, instead of
evaluating all 3 billion bases from each genome, or even the 3 to 4 million
common SNPs, it is possible to evaluate as few as 300,000 to 500,000 SNPs or
even fewer in order to make a correlation with the whole genome of an organism.
The haplotype patterns detected by reading these particular SNPs provide
enough information to allow statistically accurate association data to be
extracted from specific populations. Alternatively, if one identifies a large
percentage of common SNPs in a genetic region in an association study, the
haplotype blocks may be used to verify genotyping results. Pharmaceutical
partners, for example, may then pay for the association of human genetic
profiles with disease symptoms, drug responses, or other phenotypic states.
Agricultural partners, for example, may pay for the association of genetic
profiles of various crops with pest resistance, better quality produce, better
yields or other phenotypic states. Veterinary partners may pay for the
association of genetic profiles of various animals with drug resistance, improved
performance, increased virility or other phenotypic traits.

[00023] Furthermore, one can acquire the rights to drugs that others place
at reduced value (such as those that failed to gain regulatory approval, failed to
meet primary efficacy endpoints in late-stage clinical trials, have serious adverse
effects, or are not first-line therapy) and apply the technology herein to
determine which patients are best suited for the drug, thereby significantly
increasing the drug's value.

[00024] With initial funding from partners and others, the present method
provides for a comparison of human DNA with the DNA of various animals
and/or other organisms to identify non-genic regions of DNA conserved
between species through evolution. These funds may then be used to conduct
discovery processes without partners for development of an internal diagnostic
and drug pipeline of products.

II. Scientific Basis.

[00025] The human body contains about 100 trillion cells. Inside each
cell is a center called a nucleus. Inside the nucleus are two sets of twenty-three
human chromosomes that contain genetic material. One set of chromosomes is
inherited from the mother and the other from the father. Each set includes
similar genetic information in similar chromosomal locations.

[00026] In practice, however, there are subtle differences between
paternal and maternal versions of each genome. In fact, except for identical
twins, every individual's genome has millions of subtle differences from other
individual genomes. These subtle differences account for most of the
differences between individuals, for example, eye and skin color. In addition,
they can be used to determine an individual's predisposition to diseases,
response to drugs, reaction to the environment, and even in some cases,
behavior.

[00027] The human genome is gigantic. It is often compared to a book
written in a four-letter alphabet using a total of three billion letters, or about one
billion words. This makes one person's genome as long as 800 Bibles. If one
were to read the genome out loud at the rate of one word per second for
twenty-four hours a day, it would take a century. It is an immense document, yet it fits
inside the microscopic nucleus of a tiny cell that fits easily upon the point of a
pin.

[00028] Just like a book, the human genome is written in linear,
one-dimensional form and is defined by a code that transliterates the four-letter
alphabet into a large lexicon of meanings through the order of their groupings.
However, whereas English books are written in words of variable lengths using
twenty-six letters, genomes are written primarily in three-letter words, using
only four letters: adenine, cytosine, guanine and thymine (or in shorthand, A, C,
G and T, respectively). Furthermore, instead of being written on flat pages,
they are written in long chains or polymers of alternating phosphate and
deoxyribose sugar to which the letters or "bases" are attached as side rings.
Each human chromosome is composed of a condensed double helix DNA
polymer with a complementary DNA polymer strand.

[00029] An individual's genomic DNA affects numerous facets of life by
providing, for example, the instructions that direct the production of all proteins
in the body. Misspellings or "mutations" in DNA produce mistakes in the
proteins that are produced by cells, thus affecting the normal function of the
cells, or in regulatory elements of the cells. Although the environment often
plays a significant role, variations or mutations in DNA are directly related to
almost all human diseases, including infectious diseases, cancers, inherited
disorders, and autoimmune disorders. Variations in DNA are also responsible
for certain "protective" traits, such as a stronger immune system, more rapid
clearing of toxins from the organism, or extended lifespan. Being able to
compare the DNA of many individuals and associate those variations with
clinical symptoms provides a powerful tool for understanding, diagnosis and
treatment of a disease or promotion of a health state.

[00030] Knowledge of human genetics has led to the realization that
many diseases result from either complex interactions of several genes and/or
their gene products, from any number of variations within one gene and its
resulting gene product, or from any number of variations within the non-coding
regulatory regions of genes. For example, Type I and II diabetes have been
linked to multiple genes, each with its own pattern of variations.

[00031] The complexity of the genetic basis of disease highlights the
need for new technology capable of looking across the entire genome to analyze
large numbers of variations. Scanning entire genomes, or genomic DNA or
derivatives thereof, is one aspect of the business systems and methods disclosed
herein.

[00032] One embodiment of the systems and methods disclosed herein
provides for the separation of the full sets of
chromosomes from individuals (such as more than 10, preferably more than 20,
more preferably more than 25 and even more preferably more than 50
individual genomes) such that there are multiple unique genomes. Preferably,
haploid genomes (or genomes derived from a single set of chromosomes) are
used. Techniques for employing haploid genomes are disclosed in U.S. Ser. No.
10/106,097. The bases of all or a significant part of these genomes are then
scanned or sequenced using, for example, conventional DNA sequencers or
chip-based technologies. In a preferred embodiment, whole-wafer technology
from Affymetrix, Inc. of Santa Clara, California is used to read each of the
genomes at single-base resolution. DNA sequence data generated from each
genome is then compared with the other genomes in order to discover all or
many of the variations among the genomes. For example, individual 2 in Table
1 below has two variations as compared with individuals 1 and 3:



Table 1

Individual 1:   T   A   G   T   C   G
Individual 2:   T   A   A   T   C   C
Individual 3:   T   A   G   T   C   G
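
The comparison illustrated by Table 1 can be sketched in a few lines of code. This is only an illustration, assuming the sequences are already aligned and of equal length; the function name `variant_positions` is hypothetical and not taken from the referenced applications.

```python
def variant_positions(sequences: dict[str, str]) -> list[int]:
    """Return the 0-based positions at which aligned sequences differ."""
    lengths = {len(seq) for seq in sequences.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    # zip(*...) walks the alignment column by column.
    columns = zip(*sequences.values())
    return [i for i, column in enumerate(columns) if len(set(column)) > 1]


if __name__ == "__main__":
    table_1 = {
        "Individual 1": "TAGTCG",
        "Individual 2": "TAATCC",
        "Individual 3": "TAGTCG",
    }
    print(variant_positions(table_1))  # third and sixth bases differ
```

Applied to the rows of Table 1, the function reports the third and sixth positions (indexes 2 and 5), the two places where individual 2 differs from individuals 1 and 3.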



[00033] Because all humans are 99.9% similar in their genetic makeup,
most of the letters, or bases, of their genomes will be identical. One would
expect to discover up to about 3 to 4 million common variations among
50 scanned genomes, with "common" variations defined in some cases as those
variations that occur in at least 10 percent of people.
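
The 10 percent threshold for "common" variations can be expressed as a simple filter. The sketch below is an approximation under stated assumptions: it treats each scanned haploid genome as one observation, so the frequency of the less frequent allele in the sample stands in for the fraction of people carrying the variation. The names `minor_allele_frequency` and `is_common` are hypothetical.

```python
from collections import Counter


def minor_allele_frequency(alleles: str) -> float:
    """Frequency of the second most frequent allele at one position.

    `alleles` holds one base per scanned haploid genome, e.g. "AAAG".
    """
    counts = Counter(alleles)
    if len(counts) < 2:
        return 0.0  # no variation observed at this position
    second_most_common_count = counts.most_common()[1][1]
    return second_most_common_count / len(alleles)


def is_common(alleles: str, threshold: float = 0.10) -> bool:
    """Apply the 'at least 10 percent' definition of a common variation."""
    return minor_allele_frequency(alleles) >= threshold


if __name__ == "__main__":
    print(is_common("A" * 17 + "G" * 3))  # 3 of 20 genomes carry G
    print(is_common("A" * 19 + "G"))      # 1 of 20 genomes carries G
```

With only 20 to 50 genomes scanned, such a frequency estimate is coarse, so the threshold here should be read as a working definition rather than a precise population statistic.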

[00034] Most single nucleotide polymorphisms or "SNPs" do not occur
independently but are passed from generation to generation in variable-length
blocks of multiple SNPs, forming patterns across the genome that form an
individual's genetic profile. These patterns are referred to as "haplotypes." In
scanning the first genomes, for example the first 20 to 50, it will be possible
to identify common haplotypes in which SNPs occur. After identifying these
haplotypes it becomes possible to determine the sequence of individuals by
reading only a few SNPs from each known block of SNPs. This approach has a
tremendous advantage in conducting genetic association studies for
pharmaceutical partners, because it utilizes a reduced set of specific SNPs that
are indicators for the state of the entire genome. Instead of reading all 3 billion
bases from each genome, or even the 3 to 4 million common SNPs that may be
found, one may read, for example, only 300,000 to 500,000 informative SNPs,
once they are discovered. The haplotype patterns seen by reading these
particular SNPs allow statistically accurate association data to be extracted from
specific clinical populations. Even when not used to generate association study
data, haplotypes are useful for validating association results. For example, after
pooled samples are scanned and correlated with a phenotypic trait, individuals
can be genotyped for presence or absence of SNPs of haplotypes that are found
to be of potential significance in the pooled analysis. The allele SNPs can be
used to "check" the genotype of other SNPs of the same haplotype block.
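
Both uses of haplotype blocks described above, reading a few informative SNPs to infer the rest of a block and using the block to "check" a genotype, can be sketched as follows. The example assumes haploid data and a single block whose common haplotypes are already known. The haplotype strings, the tag positions and the function names are all made up for illustration.

```python
from typing import Optional, Sequence

# Hypothetical block of four SNPs with three known common haplotypes.
KNOWN_HAPLOTYPES = ["ACGT", "ATGA", "GCCT"]
# Hypothetical informative ("tag") SNP positions that distinguish them.
TAG_INDEXES = (0, 1)


def infer_haplotype(
    tag_alleles: Sequence[str],
    known: Sequence[str] = KNOWN_HAPLOTYPES,
    tag_indexes: Sequence[int] = TAG_INDEXES,
) -> Optional[str]:
    """Infer the full block from the tag SNPs; None if not uniquely determined."""
    matches = [
        hap for hap in known
        if all(hap[i] == allele for i, allele in zip(tag_indexes, tag_alleles))
    ]
    return matches[0] if len(matches) == 1 else None


def is_consistent(observed_block: str, known: Sequence[str] = KNOWN_HAPLOTYPES) -> bool:
    """Check a fully genotyped block against the known haplotypes."""
    return observed_block in known


if __name__ == "__main__":
    print(infer_haplotype(("A", "T")))  # the two tag SNPs identify one haplotype
    print(is_consistent("ACGA"))        # not a known pattern: possible genotyping error
```

An observed block that matches no known haplotype is not necessarily an error, since it may be a rare haplotype, so a failed check is best treated as a flag for re-genotyping rather than as proof of a mistake.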
[00035] According to one aspect of the invention, glass wafers on which
high-density arrays of nucleic acid probes have been placed are used. Each of
these wafers holds, for example, approximately 60 million nucleic acid probes
that can be used to recognize longer nucleic acid sequences in a sample. The
recognition of sample nucleic acids by the set of nucleic acid probes on the
glass wafer takes place through the mechanism of hybridization. When a
sample nucleic acid hybridizes with an array of nucleic acid probes, the sample
will bind to those probes that are complementary to the sample nucleic acid
sequence. By evaluating the level of hybridization of different probes to the
sample nucleic acid, it is possible to determine whether a known sequence of
nucleic acid is present or absent in the sample.

[00036] The use of probe arrays or wafers to decipher genetic
information involves the following steps: design and manufacture of probe
arrays or wafers, preparation of the sample, hybridization of target nucleic acids
to the array, detection of hybridization events and data analysis to determine
sequence. The preferred wafers or probe arrays are manufactured using a
process adapted from semiconductor manufacturing to achieve cost
effectiveness and high quality, as for example, those manufactured by
Affymetrix, Inc.

[00037] The design of the wafers or nucleic acid probe arrays begins with
probe selection. The probe selection algorithms are based on ability to
hybridize to the particular nucleic acid sequence to be scanned. With this
information, computer algorithms are used to design photolithographic masks
for use in manufacturing the probe arrays.

[00038] Probe arrays are preferably manufactured by a light-directed
chemical synthesis process, which combines solid-phase chemical synthesis
with photolithographic fabrication techniques employed in the semiconductor
industry. Using a series of photolithographic masks to define chip exposure
sites, followed by specific chemical synthesis steps, the process constructs
high-density arrays of oligonucleotides, with each probe in a predefined position in
the array. Multiple probe arrays are synthesized simultaneously on a large glass
wafer. This parallel process enhances reproducibility and helps achieve
economies of scale.
[00039] Once fabricated, the wafers or nucleic acid probe arrays are ready
for hybridization. The nucleic acid to be analyzed (the target) is isolated,
amplified and labeled with a fluorescent reporter group. The labeled target is
then incubated with the array using a fluidics station and hybridization oven.
After the hybridization reaction is complete, the array is inserted into the
scanner, where patterns of hybridization are detected. The hybridization data are
collected as light emitted from the fluorescent reporter groups already
incorporated into the target, which is now bound to the probe array. Probes that
most clearly match the target produce stronger signals than those that have
mismatches. Since the sequence and position of each probe on the array are
known, by complementarity, the identity of the target nucleic acid applied to the
probe array can be identified.
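
The last step, identifying the target by complementarity from the probe with the strongest signal, can be sketched as follows. This is a simplified illustration: it assumes one intensity value per probe sequence and ignores background correction, probe redundancy and signal thresholds, all of which a real array analysis would need. The function names and intensity values are hypothetical.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the strand that would hybridize to `sequence`."""
    return sequence.translate(COMPLEMENT)[::-1]


def call_target(probe_signals: dict[str, float]) -> str:
    """Infer the target sequence from the probe with the strongest signal.

    `probe_signals` maps each probe sequence to its measured fluorescence.
    """
    if not probe_signals:
        raise ValueError("no probe signals supplied")
    best_probe = max(probe_signals, key=probe_signals.get)
    return reverse_complement(best_probe)


if __name__ == "__main__":
    # Made-up intensities: the first probe matches, the second has a mismatch.
    signals = {"ATGC": 0.9, "ATGA": 0.2}
    print(call_target(signals))
```

In this made-up example the probe `ATGC` gives the strongest signal, so the inferred target is its reverse complement, `GCAT`.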

[00040] The data and information generated by the scientific assays
disclosed are valuable to numerous industries. For example, information
concerning potential drug targets is highly valuable to the biotech industry and
can greatly speed up the drug discovery process, and hence time-to-market.
Similarly, information concerning the effectiveness and efficiency of a given drug
is extremely valuable to the pharmaceutical industry and can save a company
substantial money in lost revenue due to failures in clinical trials. The
information generated herein may also be valuable to the agricultural industry,
veterinary medicine industry, consumer products industry, insurance and
healthcare provider industry and forest management (by providing genetic basis
for useful traits in plants, trees, laboratory animals and domestic animals), for
example.

III. Business Applications

[00041] The drug research and development process includes everything
from the discovery of target genomic regions to drug discovery and final
product launch. This process is currently very lengthy, expensive and risky. On
average, it takes fourteen years to develop a product from the initial research
laboratory period to FDA approval. Any event that delays the
commercialization or development process of a potential drug can cost the
affected company up to $1 billion annually in lost revenue. Conversely,
any change that can accelerate the commercialization or development cycle of a
potential drug can bring significant financial benefits to the company
that implements such changes.

[00042] Accelerated time-to-market brings not only the benefit of earlier
sales revenues but also the expanded market share enjoyed by companies that
are the first to enter a segment before their competitors. This is critical, because the
period of market exclusivity for the first drug in a new market therapeutic class
is typically much shorter than would be desirable. Consequently, marketing
expenditures have increased rapidly as companies attempt to maintain or
increase market share.

[00043] In addition to the time-to-market factors, the odds of any
compound successfully making it through all of the steps across fourteen years
are minuscule. Statistically, out of 5,000 compounds that begin in pre-clinical
development, only five make it to clinical trials, and only one is likely to reach
the market. The combination of long development cycles and high failure rates
results in an average cost of approximately $500 million for making a
successful FDA-approved compound. Therefore, business systems and methods
that improve the efficiency and timeliness of regulatory approval are greatly
valuable.

[00044] Pharmaceutical companies have recognized the need to improve
research and development efficiency by utilizing genomics in their drug
discovery programs. This effort is necessary for companies to match historical
revenue growth levels and to meet shareholders' expectations. The drive by
pharmaceutical companies for efficiency provides an opportunity for application
of genome-wide scanning technologies during both the research and clinical
development cycle.

[00045] One example of the application of the business systems and
methods herein can be found in population segmentation. It is generally
acknowledged that most drugs work more effectively for some patients than
others. Because this variability in patient response is often poorly understood,
pharmaceutical companies may unnecessarily discontinue further drug
development, fail to obtain regulatory approvals for promising drug candidates,
or if approvals are obtained, be unable to market an approved drug effectively
or to obtain approval for third party reimbursement.
[00046] Genomic differences have long been recognized as influencing
how patients respond to drugs. However, pharmaceutical companies generally
have not considered genomic differences between patients in developing and
implementing clinical trials or in the marketing of approved drugs. By
correlating genomic variation with drug response in clinical trials, it is possible
to improve the drug development and marketing process. For example,
pharmaceutical companies could use the correlation data from earlier stages of
clinical trials to make more informed decisions on whether or not to continue
trials, whether to enter later phases of trials, or which patients to enroll in
later stages (e.g., phase III or IV). For example, enrolling patients with genetic
predisposition for positive drug response can improve the therapeutic index for
these patients and improve the possibility of regulatory approval. Furthermore,
understanding the correlation between genomic differences and drug response
can enable pharmaceutical companies to improve drug marketing by identifying
segments of the population for whom particular drugs are likely to be more
effective than other drugs, and encouraging physicians to preferentially
prescribe such drugs to these patients. Marketing to physicians can be
accomplished by continuing medical education, peer-review journals, Internet,
print advertising or direct sales calls. In addition, by using the information
disclosed herein a company can better market a drug by segregating a responder
population from a non-responder population, or by segregating a population that
encounters negative side effects (or even toxicity) from a population that does
not suffer negative effects. This may further allow a company to keep a drug on
the market that would otherwise be withdrawn or to reintroduce a drug that has
already been withdrawn due to adverse effects. The information generated can
also be used to create diagnostic kits.

[00047] Drugs are typically developed to interact with a single version of
a gene product, e.g., a protein or receptor in the human body. A drug may
therefore, for example, be effective only in individuals that have a particular
variation encoding the specific protein or receptor for which the drug was
designed. Individuals who do not have a genetically caused variation in these
regions, or in regions involved in the metabolism of the drug, may not respond
to the drug or may experience adverse side effects, such as increased toxicity.

[00048] The methods used by the pharmaceutical industry to develop 
new drugs and to improve existing drugs may be changed when genetic 
variations are taken into account. Genetic variations may play a significant role
in all stages of research and development and drug discovery. Genetic variation 
information can also be used to improve drugs already on the market by 
providing information to better select drugs for a particular patient.

[00049] To further illustrate the difficulties solved herein, drugs can
interact, directly and/or indirectly, with a variety of different proteins that are
encoded and regulated by different genomic regions. Therefore, more than one
genomic region can determine how an individual responds to a given drug. The
inventions herein can be used to identify such multiple regions. As genetic
variations become better understood, it is clear that an individual's response to a
given drug is dependent upon that individual's unique genome or, more
specifically, variations within the genome.

[00050] A practical approach to understanding why different individuals
respond differently to the same drug is found in grouping individuals together
based upon specific genomic similarities or similar haplotype patterns. These
genomic similarities can occur between unrelated individuals from different
ethnic groups and/or from different geographic regions. The ability to identify
and associate genetic variations with a phenotypic state (e.g., disease and drug
responses) across the entire genome can facilitate the entire drug development
process and can reduce the time-to-market for therapeutics. For example,
genetic profiles of select subsets of patient populations may be used to enable
pharmaceutical companies to identify drug targets, focus on potentially better
leads and move more quickly into screening assays. In addition, better drug targets
can also provide for safer, more effective points of therapeutic intervention.

[00051] Markets that may be addressed by the business systems and
methods disclosed herein include, but are not limited to, evaluation of genetic
variations and drug response, evaluation of genetic variations to identify and
validate target regions, evaluation of variation and susceptibility to disease,
identification of conserved non-coding regions that may contain gene regulatory
sequences, evaluation of genetic variations and regulatory regions affecting
development, and evaluation of other genotype-phenotype associations with
commercial potential, such as in consumer products and agriculture. Potential
customers or partners for genome-wide pattern information, conserved region
information, patient profiling services and other scientific partnerships include,
for example, numerous companies in the pharmaceutical, biotechnology and
agricultural industries, as well as academic centers and government research
institutes.

[00052] Other potential customers or partners for association studies
include, for example, healthcare providers, insurance companies, government
entities (e.g., Medicaid, Medicare) and employers or any other entity interested
in achieving a more economical or effective system for providing or paying for
medical or life insurance. Such parties can utilize association studies, for
example, to selectively approve expensive drugs for patients who are correlated
with a susceptibility to an adverse reaction from a generic drug, to better evaluate
an individual's likelihood of suffering from disease (or dying) prior to underwriting
that individual, and to more effectively select health and life insurance premiums
for individuals. These parties may provide funding and/or sample sources for the
association studies herein.

[00053] The business systems and methods herein further include, for
example, the development of DNA-scanning and wafer technology and use of
that technology's genome scanning capabilities for identifying commercially
valuable genetic regions through research collaboration, and verifying such
results using association studies. Up-front fees, research payments, milestone
payments, database subscriptions, product sales and royalties may all contribute
revenue to the business model.

[00054] In the short term, the business strategies and methods herein can
generate revenues through several means. First, revenue can be generated by
providing genomic data obtained and analyzed from large-scale scanning of
genomes. Such data can further be used, for example, for genotyping and
association studies and can further be licensed to biotechnology,
pharmaceutical, or other interested parties on a non-exclusive basis. In addition
or alternatively, revenue can be generated by entering into discovery contracts
on an exclusive or non-exclusive basis with biotechnology, pharmaceutical, or other
companies that are interested in specific areas of the genome, or specific disease
areas across portions or all of the genome, even before scanning the first
genomes. Any of the methods herein may be used to verify (or remove)
existing drug target candidates. Furthermore, the business methods herein can
be used to identify SNPs and haplotypes thereof of regions conserved with other
species. This can be used for cross-species studies.

[00055] In the mid-term, the business strategies herein encompass
generating revenue through at least five means. These can supplement or replace
other sources of revenue. First, collaboration agreements can be entered into to
provide genome-wide sequencing and genetic profiling (association studies)
services that enable biotechnology, pharmaceutical and/or other partners to
analyze specific populations. The populations may comprise, for example,
individuals affected by a particular disease, participants in clinical trials or
groups displaying a particular response to a drug or environmental stimulus.
Portions of the intellectual property from such efforts may be retained by the
company performing such studies. The company performing the studies may
then further develop drug targets and/or diagnostic products based on the
retained intellectual property. Second, contract and/or grant funding from
non-profit grant-giving organizations such as the federal government may be used
for either haplotype discovery or association studies. Third, drugs can be
in-licensed where genome-wide association studies are expected to add significant
value, and then out-licensed at a premium to other companies or in exchange for
substantial milestones and royalties after identification of the genetic basis of,
for example, a drug metabolism response. Fourth, pharmaceutical partners can
contract for research funded by those pharmaceutical partners. Fifth, as
part of an inter-company cross-licensing agreement, chip or other platform
suppliers will pay a royalty on sales to their customers of chips or other
technologies containing content generated by the business.

[00056] In the longer term, royalties from products (e.g., drugs and
diagnostics) developed and commercialized can generate revenue. Such
products include, for example, products previously in-licensed, products
developed and/or commercialized independently using internal pharmaceutical
development efforts and products developed in collaboration with partner(s).
The association studies linking particular genetic variations and clinical
symptoms will be extremely valuable in identifying drug targets as well as
optimizing existing therapies. Agreements entered into with pharmaceutical and
biotechnology companies will include royalty rights on products
derived from these data.

[00057] Importantly, generated revenue can be used to conduct similar or 
different internal pharmaceutical development efforts that can potentially lead 
to the development of commercial products. 

IV. Business Flow

[00058] Fig. 1 illustrates overall steps in the business systems and
methods herein. As shown, at step 101 initial genomic samples are obtained.
Samples from a clinical population can be obtained from, for example, hospitals or
universities. Other partners or entities can also provide sample sources. In
most embodiments, the sample sources will include genomic DNA or
derivatives thereof from human sources, although other organisms can be
utilized as well. At this stage the samples need not be from case or control
groups. The samples are preferably collected from a diverse group, such as
from different ethnic backgrounds or different geographic regions. Examples of
different ethnic backgrounds include Asian, Middle-Eastern, African, Nordic,
South Pacific, etc. Further sub-groups may also be compared.

[00059] At step 110 the chromosomes are preferably separated to form
individual chromosomes. Such separation utilizes, in a preferred embodiment,
somatic cell hybrid technology as described in U.S. Ser. No. 10/106,097,
claiming priority to U.S. Ser. No. 60/332,550, filed November 26, 2001,
(Attorney Docket No. 1005P-4), entitled "Methods of Genomic Analysis,"
incorporated herein by reference for all purposes.

[00060] At step 120 the genomes are scanned for variants from a baseline
sequence in a reference database 130. Variants can include, for example, SNPs
or contiguous stretches of genomic DNA (variant DNA stretches) comprising
one or more SNPs. Stretches of genomic DNA can comprise at least 1,000
contiguous bases, at least 5,000 contiguous bases, at least 10,000 contiguous
bases, at least 50,000 contiguous bases, at least 100,000 contiguous bases or at
least 500,000 contiguous bases. Furthermore, variant DNA stretches can
comprise at least 1 SNP, at least 5 SNPs, at least 10 SNPs, at least 25 SNPs,
at least 50 SNPs, or at least 100 SNPs. The reference database may be
proprietary or a public database such as GenBank. Preferably, more than 10
different sources are scanned (it being understood that a single organism
contains the genetic code from two different sources). In more preferred
embodiments, more than 20 different sources are scanned, preferably more than
25 different sources, more preferably more than 30 different sources, preferably
more than 50 different sources, and more preferably more than 100 sources are
scanned. As described in U.S. Ser. No. 10/042,819, entitled "Genetic Analysis
Systems and Methods," filed on January 7, 2002, assigned to the assignee, the
scanning step may be a one-step or two-step process. In a two-step process
each consecutive base in a genetic sequence is analyzed initially, and only those
variants that are identified in the initial step are utilized for scanning in later
grouping processes. The scanning step can utilize a number of technology
platforms such as chips, capillary or gel-based DNA sequencers, microtiter
hybridization wells or others. The genomes that are scanned are preferably
from genetically diverse groups such as, for example, different racial groups,
ethnic groups or geographic locations. This is based on the assumption that
evolutionarily older SNPs that have more global applicability are common
SNPs. However, specific groups or sub-groups can also be scanned to identify
variants that are more useful for those sub-populations.
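
By way of illustration only, the comparison of a scanned sequence against a
baseline sequence at step 120 may be sketched in Python as follows. The sketch
assumes that the two sequences are already aligned and of equal length and
considers single-base substitutions only. The function name `find_variants` and
the example sequences are hypothetical and are not taken from the reference
database 130 or from any of the scanning platforms named above.

```python
def find_variants(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for every mismatch.

    Assumes both sequences are pre-aligned and of equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Hypothetical baseline and scanned sequences (illustrative only).
reference = "CGATCTGAGCG"
sample = "TGACCTGGGCG"
variants = find_variants(reference, sample)
```

In this example `variants` holds three substitutions, at zero-based positions
0, 3 and 7. Insertions and deletions would require an alignment step that the
sketch does not attempt.
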
[00061] Preferably, the entire genetic code in the reference database 130
is analyzed or scanned for variants. Preferably, at least the non-repeat regions
of the reference database are analyzed to identify SNPs and preferably common
SNPs. Preferably, gene and non-gene regions of the sequences in the database
are analyzed. Preferably, more than 1,000,000 bases are analyzed, preferably
more than 10,000,000, more preferably more than 100,000,000, more preferably
more than 500,000,000, and more preferably more than 1,000,000,000 bases.
Preferably, more than 50% of at least one chromosome is scanned in multiple
samples, preferably more than two chromosomes, preferably more than five
chromosomes, and most preferably all of the chromosomes in the particular
organism under study. By "common" SNPs, it is intended to refer to SNPs
occurring in more than 2% of the sample population studied, preferably more
than 4% of the population studied, preferably more than 6% of the population
studied, preferably more than 8% of the population studied, and more preferably
more than 10% of the population studied. Of course, rare SNPs can also be
utilized in particular association studies at step 125 for commercialization of
diagnostics and therapeutics and at step 135 as an adjunct to the other methods
herein. "Rare" SNPs are non-common SNPs. Furthermore, the wafers or arrays
used herein may also be used to compare genomes of, e.g., humans with other
organisms such as mouse, rat or dog to, e.g., identify conserved regions at step
123.
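
The distinction between "common" and "rare" SNPs drawn above can be expressed
as a simple frequency threshold. The following Python sketch is illustrative
only: the function name `classify_snp` and its parameters are hypothetical, and
the default threshold of 10% is merely the most preferred of the several values
(2%, 4%, 6%, 8%, 10%) recited in the preceding paragraph.

```python
def classify_snp(variant_count: int, sources_scanned: int,
                 threshold: float = 0.10) -> str:
    """Label a SNP "common" if its frequency exceeds the threshold, else "rare".

    variant_count   -- number of scanned sources carrying the variant
    sources_scanned -- total number of sources scanned
    threshold       -- assumed cut-off; the text recites values from 2% to 10%
    """
    if sources_scanned <= 0:
        raise ValueError("at least one source must be scanned")
    frequency = variant_count / sources_scanned
    return "common" if frequency > threshold else "rare"


# Hypothetical counts: a variant seen in 3 of 20 sources, and in 1 of 50.
label_a = classify_snp(3, 20)   # frequency 0.15
label_b = classify_snp(1, 50)   # frequency 0.02
```

Under the 10% default, the first variant is labeled common and the second rare.
Passing a lower `threshold` applies one of the broader definitions instead.
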

[00062] The SNPs, haplotype blocks and other variant information such
as variant DNA stretches can be stored in a database 121, which may be, in
whole or in part, licensed directly for revenue. In addition, in one embodiment,
the common variants identified at step 120 are placed in haplotype blocks. This
can be accomplished, for example, using the methods disclosed in U.S. Ser.
No. 60/280,530, filed March 30, 2001; U.S. Ser. No. 60/313,264, filed August
17, 2001; U.S. Ser. No. 60/327,006, filed October 5, 2001; U.S. Ser. No.
60/332,550, filed November 26, 2001; and U.S. Ser. No. 10/106,097, filed
March 26, 2002, previously incorporated herein by reference. Representative
variants and haplotype blocks from an entire human chromosome (chromosome
21) are disclosed, for example, in Patil, N. et al., "Blocks of Limited
Haplotype Diversity Revealed by High-Resolution Scanning of Human
Chromosome 21," Science 294, 1719-1723 (2001) and the associated
supplemental materials, incorporated herein by reference. At step 142,
representative SNPs or variant DNA stretches are selected from among the
haplotype blocks to be used in association studies.

[00063] The SNPs or variant DNA stretches selected at step 142 may be 
used in a variety of applications. For example, in collaboration with other 
companies, one may select portions of the genome of an organism at step 144 
that are believed to play a role in a particular phenotype such as a disease state.
A partner may or may not provide the funding at step 146 and/or a sample set, 
for example, of disease and control samples, at step 148.

[00064] At step 150 SNPs, representative SNPs or variant DNA
stretches from each sample are used in one or more assays to determine which
portions of the tentatively selected functional region correlate in fact with the
phenotype of interest. Such assays include, for example, a chip-based assay,
Invader™ assay, Taqman™ assay, GeneEngine™ sequencer, Pyrosequencing™
assay, MassARRAY™ assay, BeadArray™ assay, or gel or capillary
sequencing assay. An Invader™ assay is based on an enzyme-substrate reaction
for quantifying DNA and RNA by binding two short DNA probes to a target,
enzymatically cleaving the target, binding a fluorescently labeled probe to the
cleavage site and then cutting the probe-bound DNA to detect a fluorescence
signal. A GeneEngine™ assay feeds DNA through small channels and then
scans the DNA using high-speed optics. A MassARRAY™ assay assembles
sequence information using mass spectrometry by obtaining base-specific chain
termination information using enzymatic reactions. The BeadArray™ assay
utilizes light-conducting fiber optic bundle substrates that are composed of
3-micron beads. Gel and capillary sequencing operations as well as other assays
are well known to those skilled in the art.

[00065] The funding for use in the business methods herein will normally
take on multiple forms. For example, the funding may be by way of funding for
costs of performing the particular study in question, in some cases with margin.
In addition, funding may be provided by way of milestone payments, for
example, at the time the SNPs/haplotypes are identified in a region, at the time
the association study is completed, at the time research results are confirmed, at
the time clinical trials of various stages are started and/or completed, at the time
drugs or diagnostics begin to be marketed and/or reach sales milestones, and/or
royalties on sales of the relevant drug, diagnostic, or drug sold in conjunction
with the population segregation diagnostic, or similar fees for other products
such as agricultural products or consumer products. In addition, particular
partners may provide funding by way of equity investment, and/or equity may
be sold to investors.

[00066] Alternatively, or in addition, at step 152 whole genome studies
are performed whereby the SNPs or variant DNA stretches from all or a
substantial part of the genome are correlated with a phenotypic state such as a
disease state. This correlation may also be conducted through
pooling samples to reduce overall costs or by genotyping individual samples.
This step may also be performed in collaboration with others, who may provide
samples at step 154 and/or funding at step 156. Of course, the sample source
and the funding source may or may not be the same in each of the studies
herein. In one example, the funding source may be a pharmaceutical company
and the source of samples may be a hospital, academic research center or other
company. In another example, the funding source and/or sample source may be
from an insurance company, government entity or employer.

[00067] In one embodiment, at step 197 rights to a drug or product are
acquired prior to initiating association studies. Such drug or product may be,
for example, one that has been pulled off the market due to unpredictable
adverse effects in a small group of individuals or may be one that did not obtain
regulatory approval due to a large number of individuals experiencing
unanticipated effects in clinical trials. The company may then be able to use
association studies to correlate genomic differences with drug responses
and create a diagnostic kit to identify patients who may benefit from the drug.
Thus, by stratifying patients and identifying outlier individuals, a company may
be able both to obtain regulatory approval and to drastically increase the value of
its acquired drugs.

[00068] A number of activities can be based upon the results of steps 150
and 152. For example, at step 158 diagnostic markers may be used to develop 
diagnostic tests that are indicative of a patient's tendency to a disease (or, for 
example, being a carrier of a disease variant). Based on the markers, the 
diagnostics may be developed and commercialized at step 160. The diagnostics 
may take on a number of forms such as immunoassays, chip based DNA assays, 
PCR assays, Taqman™ assays, sequencing-based assays or the like.

[00069] In addition, or in the alternative, at step 162 the correlation
studies are used to select disease targets for drug development. Once a genetic
locus or multiple loci in the genome are associated with a particular phenotypic
trait, for example, a disease susceptibility locus, the gene(s) or regulatory
element(s) responsible for the trait can be identified. These gene(s) or
regulatory element(s) may then be used as therapeutic targets for the treatment
of the disease, as shown at step 164, or for commercialization, as shown at step
166, independently or in collaboration with partners.

[00070] In addition, or in the alternative, at step 168 the phenotypic trait
of drug response is used to stratify patients into various groups. The groups
may be, for example, those that respond to a drug versus those that do not
respond, or those that respond to a drug without toxic effects versus those that are
observed to have toxic effects. At step 170 the therapeutic may be marketed
with an associated diagnostic that is capable of segregating those that will
respond in an acceptable manner to the drug from those that will not.
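
As a minimal sketch of the stratification at step 168, the following Python
example groups patients by genotype at a single marker and computes the
fraction of responders in each group. The marker genotypes, the patient
records and the function name `response_rate_by_genotype` are all hypothetical
and serve only to illustrate the idea. No particular stratification procedure
is prescribed herein.

```python
from collections import defaultdict


def response_rate_by_genotype(patients):
    """Return {genotype: fraction of patients with that genotype who responded}.

    patients -- iterable of (genotype, responded) pairs, where responded is a bool
    """
    totals = defaultdict(int)
    responders = defaultdict(int)
    for genotype, responded in patients:
        totals[genotype] += 1
        if responded:
            responders[genotype] += 1
    return {genotype: responders[genotype] / totals[genotype] for genotype in totals}


# Hypothetical trial records: (genotype at one marker, responded to drug?)
patients = [
    ("CC", True), ("CC", True),
    ("CT", True), ("CT", False),
    ("TT", False), ("TT", False),
]
rates = response_rate_by_genotype(patients)
```

In this made-up data set every CC patient responds, half of the CT patients
respond, and no TT patient responds, which is the kind of separation a
companion diagnostic at step 170 would rely upon.
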

[00071] Figure 2 illustrates another embodiment of the methods and
systems herein. At step 201, a plurality of whole genomes is scanned to identify
all variants. Table II, row 1 below, illustrates a contiguous stretch of genomic
DNA with all identified variants. At this stage, the variants may include
common SNPs, rare SNPs and variant DNA stretches. Each SNP is illustrated
by a framed base. A first and a second allele are further illustrated.

[00072] Next, at step 203, common variants and variant DNA stretches
are identified, and at step 205 common variants are grouped into haplotype
blocks using any one or more of the methods herein. Table II, row 2, illustrates
common SNPs grouped into two haplotype blocks. Starting from the left of the
genomic sequence, variants C, T, A in allele 1 and the respective allele variants
T, C and G in allele 2 form a first haplotype block whose bases are illustrated in
bold; next, variants C, C, A and their respective allele variants A, T, and G form
a second haplotype block whose bases are underlined; furthermore,
common variants T and C in allele 1 and G in allele 2 form a third haplotype
block illustrated in bold and italic. These common SNPs and haplotype blocks
can be used in pooled association studies as described below.

[00073] After case and control sample sources are obtained at step 251
and their genomes are separated at step 253 (using any of the methods disclosed
herein), all control genomes and all case genomes are "pooled" together at step
255. The pooled genomes are scanned at step 257 to identify all common
variants or variant DNA stretches identified at step 205 using any of the
methods herein. Table II, row 3, illustrates an example of results from scanning
pooled case and control genomes.

[00074] After scanning both pooled cases and controls for common
variants or variant DNA stretches, the relative allele frequencies of common
variants are analyzed at step 259. If the frequencies of two allele variants are
substantially the same, such as when each allele appears in approximately 50%
of all samples, the allele variants cannot be used for correlation purposes and
are excluded from further analyses. On the other hand, if the frequencies of two
allele variants are not substantially similar, a common variant is identified
which can be used to correlate a particular genotype with a phenotypic state of
interest. Table II, row 4, illustrates three different relative allele frequencies.
The variants on the left end of the genomic sequence (C, T, A, C, C and A in
allele 1 and T, C, G, A, T and G in allele 2) are expressed at either very high or
low frequencies (e.g., 1%, 2%, 98% or 99%) and therefore are variants (or
encompassed within variant DNA stretches) that can be used for correlation.
On the other hand, the variants on the right end of the genomic sequence (T and
C in allele 1; and G and G in allele 2) are expressed in roughly half the genomes
(e.g., 55% and 45%) and cannot be used for correlation; therefore, these variants
are excluded from further analysis in the example in Table II.

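
The exclusion rule applied at step 259 can be sketched as a filter on the
difference between the two relative allele frequencies. The Python example
below is illustrative only. The cut-off `min_difference`, the variant labels
and the function name `informative_variants` are assumptions, since the text
gives example frequencies (1%, 2%, 98%, 99% versus 55% and 45%) but no numeric
threshold for "substantially the same".

```python
def informative_variants(frequencies, min_difference=0.5):
    """Return the variants whose two allele frequencies differ substantially.

    frequencies    -- {variant_label: (allele_1_frequency, allele_2_frequency)}
    min_difference -- assumed cut-off for "not substantially similar"
    """
    return [
        label
        for label, (allele_1, allele_2) in frequencies.items()
        if abs(allele_1 - allele_2) >= min_difference
    ]


# Frequencies patterned on the examples in the text; the labels are hypothetical.
frequencies = {
    "variant 1": (0.99, 0.01),
    "variant 2": (0.02, 0.98),
    "variant 7": (0.55, 0.45),
}
informative = informative_variants(frequencies)
```

Here the first two variants are kept and "variant 7", whose alleles each appear
in roughly half of the genomes, is excluded from further analysis.
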
[00075] The common variants and variant DNA stretches (C, T, A, C, C
and A in allele 1 and T, C, G, A, T and G in allele 2) are then used in
association studies to correlate genomic regions with a phenotypic state of
interest at step 260. This is often referred to as a first round of association.
Most often, only some of the common variants will be found to correlate with a
phenotypic state. However, at this stage, even variants that show a relatively
low degree of correlation are retained for further analysis. Table II, row 5,
illustrates an example wherein only three of the six remaining variants are
correlated to a phenotypic state of interest. Common variants or variant DNA
stretches that are not correlated to a phenotypic state are excluded from further
analysis.
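
No particular statistic is prescribed herein for the first round of association
at step 260. One conventional choice, assumed here purely for illustration, is
a chi-square test on a 2x2 table of allele counts in pooled cases versus pooled
controls. In the Python sketch below, the function name `chi_square_2x2` and
the allele counts are hypothetical. The value 3.841 is the standard chi-square
critical value for one degree of freedom at p = 0.05.

```python
def chi_square_2x2(case_allele_1, case_allele_2, control_allele_1, control_allele_2):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    a, b, c, d = case_allele_1, case_allele_2, control_allele_1, control_allele_2
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # a row or column is empty, so no association can be measured
    return n * (a * d - b * c) ** 2 / denominator


CRITICAL_VALUE_P05 = 3.841  # chi-square, 1 degree of freedom, p = 0.05

# Hypothetical allele counts: cases carry allele 1 far more often than controls.
statistic = chi_square_2x2(80, 20, 50, 50)
is_associated = statistic > CRITICAL_VALUE_P05
```

With these made-up counts the statistic is approximately 19.78, well above the
critical value, so the variant would be carried forward. A practical study
would also correct for the number of variants tested, which this sketch omits.
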

[00076] The variants (including variant DNA stretches) correlated in the
first round of association can then be verified by scanning a subset of pooled
samples for the highly correlated regions identified in the first round of
association at step 263 (T, C, G in allele 1 and C, T, and A in allele 2). If a
subset of pooled samples shows identical results after a second round of
association, the samples are retained for further analysis. In a more preferred
embodiment, the correlated variants (or variant DNA stretches) are verified by
genotyping individual samples at step 265. Genotyping may also be considered
a second round of association. Table II, row 6, illustrates an example of
verifying highly correlated variants or variant DNA stretches by genotyping
individual case and control samples. In this example, samples 1 and 2 are case
samples, while samples 3 and 4 are control samples. In sample 1, all variants
are identical to those in the pooled genomes; therefore, sample 1 corroborates
that variants T, C, and G (or a genomic stretch encompassing those variants) are
correlated with a phenotypic state and the sample is retained for further studies.
Conversely, the variants in sample 2 (C, T and A) are not identical to those in
the pooled case genomes; therefore, sample 2 can be either genotyped again (to
verify results) or excluded from further analysis. By analogy, sample 3
corroborates that variants C, T, and A are correlated with a control phenotypic
state, and therefore this sample is retained for further investigation. On the
other hand, sample 4 does not have all of the requisite variants to correlate to a
control phenotypic state and is excluded from further studies.
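
The sample-by-sample verification at step 265 amounts to comparing each
genotyped sample with the pooled result for its own group. The following Python
sketch mirrors the four samples discussed above. The pooled variants (T, C, G
for cases and C, T, A for controls) and the variants of samples 1 to 3 follow
the text, whereas the alleles shown for sample 4 and the function name
`retained_samples` are hypothetical.

```python
# Pooled first-round results per group, as recited in the text.
POOLED = {"case": ("T", "C", "G"), "control": ("C", "T", "A")}


def retained_samples(samples):
    """Return the names of samples whose variants match their group's pooled result.

    samples -- iterable of (name, group, variants) with group "case" or "control"
    """
    return [
        name
        for name, group, variants in samples
        if tuple(variants) == POOLED[group]
    ]


samples = [
    ("sample 1", "case", ("T", "C", "G")),
    ("sample 2", "case", ("C", "T", "A")),
    ("sample 3", "control", ("C", "T", "A")),
    ("sample 4", "control", ("C", "T", "G")),  # hypothetical alleles
]
retained = retained_samples(samples)
```

Samples 1 and 3 are retained. Samples 2 and 4 would be genotyped again or
excluded, as described above.
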

[00077] In addition to genotyping individual samples, the haplotype
blocks identified at step 205 can also be used to verify highly correlated variants
or variant DNA stretches from the first round of association. In particular, after
genotyping a sample for highly correlated variants, haplotype blocks can be
used to verify genotyping results, as further illustrated in Table II, row 7. In this
example, haplotype block 1 is C-T-A and haplotype block 2 is T-C-G. Samples
1 and 3 align with the haplotype blocks 1 and 2, respectively. Aligning with a
haplotype block can confirm variants from first and/or second rounds of
association. On the contrary, samples 2 and 4 do not perfectly align with a
haplotype block and therefore cannot corroborate the pooled correlation study.
Therefore, these samples are excluded from further analysis.

Table II

1   Scan whole genomes to identify variants:
    gGA@CT(^GCGgC(^ri@A.C@I@ (allele 1)
    @GA@CTG^GCG@CC@n§ACgl§ (allele 2)

2   Identify common variants and group into haplotype blocks:
    @GA@CTG@GCGgCCgTT@AC@T@ (allele 1)
    |GA@CTG§GCG0CCgri§AC@T@ (allele 2)

3   Scan "pooled" control and case genomes for all variants
    Case:    @GA@CTG§GCG@CCgTT@AC@I@
    Control: @GA@CTG@GCG@CC@riPAcQl^

4   Identify More Common Variants By Relative Allele Frequency
    IC^APICTGlA^G^ (allele 1)
    QGA^CTG^^GSC(^^T§^(^3lj^ (allele 2)

5   Common variants:
    Case:    @GA@CTG§GCGgcC@TTgACGTC
    Control: @GA@CTG@GCG@CC|T , lgACGTG
    Correlate common variants with phenotypic state
    Case:    @GA@CTGgGCGCCCCTTAACGTC
    Control: @GA@CTG@GCGACCTTTGACGTG

6   Highly correlated variants are verified by genotyping
    Case:    @GAgCTG§GCGCCCCTTGACGTC
    Control: @GA|CTG@GCGACCTTTGACGTG
    Case Samples:
    Sample 1: @GAgCTGgGCGCCCCTTGACGTC
    Sample 2: @GA@CTG@GCGCCCCTTGACGTG
    Control Samples:
    Sample 3: @GA@CTG@GCGACCTTTGACGTG
    Sample 4: @GA@CTG@GCGACCTTTGACGTG

7   Corroborate Using Selected Haplotype Block
    Haplotype 1: gGA@CTG@
    Haplotype 2: @GA@CTG§
    Case Samples:
    Sample 1: @GA@CTG|g|GCGCCCCTTGACGTG
    Sample 2: gGA@CTG{}^GCCCCTTGACGTG
    Control Samples:
    Sample 3: ^GA@CTG@GCGACCTTTGACGTG
    Sample 4: PJA@CTGSGCGACCTTTGACGTG
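
The corroboration in row 7 of Table II, in which a genotyped sample is checked
against the haplotype blocks identified at step 205, can be sketched in Python
as a lookup against a small catalog of blocks. The two block patterns (C-T-A
and T-C-G) are those recited in paragraph [00077]. The sample genotypes and the
function name `matching_haplotype` are hypothetical and are not transcribed
from the table.

```python
# Haplotype blocks as recited in the text.
HAPLOTYPE_BLOCKS = {
    "haplotype 1": ("C", "T", "A"),
    "haplotype 2": ("T", "C", "G"),
}


def matching_haplotype(sample_variants):
    """Return the name of the block the sample aligns with perfectly, else None."""
    for name, pattern in HAPLOTYPE_BLOCKS.items():
        if tuple(sample_variants) == pattern:
            return name
    return None


# Hypothetical genotypes at the three positions of the block.
aligned = matching_haplotype(("C", "T", "A"))
unaligned = matching_haplotype(("C", "C", "G"))
```

A sample that returns a block name corroborates the pooled correlation study. A
sample that returns `None` does not align perfectly with any block and would be
excluded from further analysis.
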



[00078] The results obtained from steps 260, 263, 265, and 269 can be 
used for one or more of the applications previously described herein, for 
example, the development of diagnostic markers, kits and reagents for 
identifying a phenotypic trait of interest, or the identification of target genomic
regions for drug discovery. Any of these applications can be accomplished 
independently or in collaboration with partners. 

[00079] The genomic sequences identified by the methods of the present
invention may be genic or nongenic sequences. The term identified "gene" is
intended to mean the open reading frame encoding specific polypeptides,
intronic regions, as well as adjacent 5' and 3' non-coding nucleotide
sequences involved in the regulation of expression of the gene, generally up to
about 10 kb beyond the coding region, but possibly further in either direction.
The coding sequences (ORFs) of an identified gene may affect a phenotype state by
affecting protein structure. Alternatively, the non-coding sequences of the
identified gene or nongenic sequences may affect a phenotype state by
impacting the level of expression or specificity of expression of a protein.

[00080] Genomic sequences are studied generally by isolating the
identified genomic sequence such that it is substantially free of other nucleic
acid sequences that do not include the sequence of interest. The sequences are
used in a variety of ways. For example, the nucleic acid sequence may be used
to detect or quantify expression of a gene in a biological specimen. The manner
in which one probes cells for the presence of particular nucleotide sequences is
well established in the literature and does not require elaboration here; see, e.g.,
Sambrook, et al., Molecular Cloning: A Laboratory Manual (Cold Spring
Harbor Laboratory, New York) (1989).

[00081] Alternatively, individuals may be studied that are resistant to a 
particular disease, such as HIV. By understanding the genetic basis of disease
resistance it may be possible to identify therapeutic and/or diagnostic targets. In 
addition, individuals may be selected for desirable traits such as, for example, 
healthy state, enhanced performance state, protective state, extended longevity 
and resistance state. 

[00082] According to one aspect of the business systems and methods
disclosed herein, when a region of the genome has been identified as playing a
role in a phenotypic state, after a first, second or any subsequent round of
association studies, the genome may be the subject of further SNP scanning at
steps 120, 260, 263 or 265, for example to identify rare SNPs that may be
associated with a disease. These SNPs may play a role in, for example, rare
forms of the disease.
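
As a rough illustration of how SNPs observed in a follow-up scan might be separated into rare and common classes, the Python sketch below computes a minor allele frequency for each SNP and compares it with a cutoff. The data layout, the function names and the 10% cutoff are assumptions made for illustration only; the description does not prescribe a particular threshold or implementation.

```python
from collections import Counter


def minor_allele_frequency(alleles):
    """Return the frequency of the second most common allele (0.0 if monomorphic)."""
    counts = Counter(alleles)
    if len(counts) < 2:
        return 0.0
    ranked = counts.most_common()
    return ranked[1][1] / len(alleles)


def split_rare_and_common(snp_alleles, cutoff=0.10):
    """Split SNP ids into (rare, common) lists using an assumed frequency cutoff."""
    rare, common = [], []
    for snp_id, alleles in snp_alleles.items():
        maf = minor_allele_frequency(alleles)
        if maf == 0.0:
            continue  # not polymorphic in this sample set
        if maf > cutoff:
            common.append(snp_id)
        else:
            rare.append(snp_id)
    return rare, common


# Hypothetical observations: one allele call per chromosome sampled.
observed = {
    "snp_a": list("A" * 19 + "G"),      # minor allele on 1 of 20 chromosomes
    "snp_b": list("C" * 12 + "T" * 8),  # minor allele on 8 of 20 chromosomes
    "snp_c": list("G" * 20),            # monomorphic, ignored
}
rare_snps, common_snps = split_rare_and_common(observed)
```

With these made-up inputs, `snp_a` falls below the assumed cutoff and would be a candidate rare SNP, while `snp_b` would be treated as common.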

[00083] In addition, the sequence of the gene (including flanking
promoter regions and coding regions) may be mutated in various ways known in
the art to generate targeted changes in promoter strength, or changes in the
sequence of the encoded protein, etc. The sequence changes may be
substitutions, translocations, insertions or deletions. Deletions may further
include larger changes, such as deletions of a domain or exon. Techniques for
in vitro mutagenesis of cloned genes are known. Examples of protocols for
site-specific mutagenesis may be found in Gustin, et al., Biotechniques 14:22
(1993); Barany, Gene 37:111-23 (1985); Colicelli, et al., Mol. Gen. Genet.
199:537-9 (1985); Prentki, et al., Gene 29:303-13 (1984); Sambrook, et al.,
Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Press) pp.
15.3-15.108 (1989); Weiner, et al., Gene 126:35-41 (1993); Sayers, et al.,
Biotechniques 13:592-6 (1992); Jones and Winistorfer, Biotechniques
12:528-30 (1992); and Barton, et al., Nucleic Acids Res. 18:7349-55 (1990). Such
mutated genes may be used to study structure/function relationships of the
protein product, or to alter the properties of the protein that affect its function or
regulation.

[00084] The identified gene may be employed for producing all or
portions of the resulting polypeptide, or protein. To express a protein product, 
an expression cassette incorporating the identified gene may be employed. The 
expression cassette or vector generally provides a transcriptional and 
translational initiation region, which may be inducible or constitutive, where the 
coding region is operably linked under the transcriptional control of the 
transcriptional initiation region, and a transcriptional and translational 
termination region. These control regions may be native to the identified gene, 
or may be derived from exogenous sources. 

[00085] The polypeptide may be expressed in prokaryotes or eukaryotes 
in accordance with conventional methods, depending upon the purpose for 
expression. For large-scale production of the protein, a unicellular organism, 
such as E. coli, B. subtilis, S. cerevisiae, insect cells in combination with
baculovirus vectors, or cells of a higher organism such as vertebrates,
particularly mammals, e.g. COS 7 cells, may be used as the expression host 
cells. In many situations, it may be desirable to express the gene in eukaryotic 
cells, where the gene will benefit from native folding and post-translational 
modifications. Small peptides also can be synthesized in the laboratory. With 
the availability of the protein or fragments thereof in large amounts, the protein 
may be isolated and purified in accordance with conventional ways. A lysate 
may be prepared of the expression host and the protein purified using HPLC, 
exclusion chromatography, gel electrophoresis, affinity chromatography, or 
other purification technique. 

[00086] An expressed protein may be used for the production of 
antibodies, where short fragments induce the expression of antibodies specific 
for the particular polypeptide, and larger fragments or the entire protein allow 
for the production of antibodies over the surface of the polypeptide (polyclonal 
antibodies). Antibodies are prepared in accordance with conventional ways, 
where the expressed polypeptide or protein is used as an immunogen, by itself 
or conjugated to known immunogenic carriers, e.g. KLH, pre-S HBsAg, other 
viral or eukaryotic proteins, or the like. Various adjuvants may be employed, 
with a series of injections, as appropriate. For monoclonal antibodies, after one 
or more booster injections, the spleen is isolated, the lymphocytes are
immortalized by cell fusion and screened for high affinity antibody binding.
The immortalized cells, hybridomas, producing the desired antibodies may
then be expanded. For further description, see Monoclonal Antibodies: A
Laboratory Manual, Harlow and Lane, eds. (Cold Spring Harbor Laboratories,
Cold Spring Harbor, N.Y.) (1988). If desired, the mRNA encoding the heavy
and light chains may be isolated and mutagenized by cloning in E. coli, and the
heavy and light chains mixed to further enhance the affinity of the antibody.
Alternatives to in vivo immunization as a method of raising antibodies include
binding to phage "display" libraries, usually in conjunction with in vitro affinity
maturation.

[00087] The identified genes, gene fragments, or the encoded protein or
protein fragments may be useful in gene therapy to treat degenerative and other
disorders. For example, expression vectors may be used to introduce the
identified gene (or variant thereof) into a cell. Such vectors generally have
convenient restriction sites located near the promoter sequence to provide for
the insertion of nucleic acid sequences. Transcription cassettes may be prepared
comprising a transcription initiation region, the target gene or fragment thereof,
and a transcriptional termination region. The transcription cassettes may be
introduced into a variety of vectors, e.g. plasmid; retrovirus, e.g. lentivirus;
adenovirus; and the like, where the vectors are able to be transiently or stably
maintained in the cells. The gene or protein product may be introduced into
tissues or host cells by any number of routes, including viral infection,
microinjection, or fusion of vesicles. Jet injection may also be used for
intramuscular administration, as described by Furth, et al., Anal. Biochem.
205:365-68 (1992). Alternatively, the DNA may be coated onto gold
microparticles, and delivered intradermally by a particle bombardment device,
or "gene gun" as described in the literature (see, for example, Tang, et al.,
Nature 356:152-54 (1992)).

[00088] Antisense molecules can be used to down-regulate expression of
the identified gene in cells. The antisense reagent may be antisense
oligonucleotides, particularly synthetic antisense oligonucleotides having
chemical modifications, or nucleic acid constructs that express such antisense
molecules as RNA. The antisense sequence may be complementary to the
mRNA of the targeted gene, and inhibits expression of the targeted gene
products. Antisense molecules inhibit gene expression through various
mechanisms, e.g., by reducing the amount of mRNA available for translation,
through activation of RNAse H, or by steric hindrance. One or a combination
of antisense molecules may be administered, where a combination may
comprise multiple different sequences.

[00089] As an alternative to antisense inhibitors, catalytic nucleic acid
compounds, e.g., ribozymes, anti-sense conjugates, etc., may be used to inhibit
gene expression. Ribozymes may be synthesized in vitro and administered to
the patient, or may be encoded on an expression vector, from which the
ribozyme is synthesized in the targeted cell (for example, see International
patent application WO 9523225, and Beigelman, et al., Nucl. Acids Res.
23:4434-42 (1995)). Examples of oligonucleotides with catalytic activity are
described in WO 9506764. Conjugates of antisense oligonucleotides with a
metal complex, e.g. terpyridylCu(II), capable of mediating mRNA hydrolysis
are described in Bashkin, et al., Appl. Biochem. Biotechnol. 54:43-56 (1995).
[00090] In addition to using the identified sequences for gene therapy, the 
identified nucleic acids can be used to generate genetically modified non-human 
animals to create animal models of diseases or to generate site-specific gene 
modifications in cell lines for the study of protein function or regulation. The 
term "transgenic" is intended to encompass genetically modified animals having 
an exogenous gene that is stably transmitted in the host cells where the gene 
may be altered in sequence to produce a modified protein, or having an 
exogenous promoter operably linked to a reporter gene. Transgenic animals 
may be made through homologous recombination, where the gene locus is 
altered. Alternatively, a nucleic acid construct is randomly integrated into the 
genome. Vectors for stable integration include plasmids, retroviruses and other 
animal viruses, YACs, and the like. Of interest are transgenic mammals, e.g., 
cows, pigs, goats, horses, etc., and, particularly, rodents, e.g., rats, mice, etc. 
[00091] Investigation of genetic function may also utilize
non-mammalian models, particularly using those organisms that are biologically and
genetically well-characterized, such as C. elegans, D. melanogaster and S.
cerevisiae. The subject gene sequences may be used to knock-out
corresponding gene function or to complement defined genetic lesions in order
to determine the physiological and biochemical pathways involved in protein
function. Drug screening may be performed in combination with
complementation or knock-out studies, e.g., to study progression of
degenerative disease, to test therapies, or for drug discovery.

[00092] In addition, the modified cells or animals are useful in the study 
of protein function and regulation. For example, a series of small deletions 
and/or substitutions may be made in the identified gene to determine the role of 
different domains in enzymatic activity, cell transport or localization, etc.
Specific constructs of interest include, but are not limited to, antisense
constructs to block gene expression, expression of dominant negative genetic
mutations, and over-expression of the identified gene. One may also provide
for expression of the identified gene or variants thereof in cells or tissues where
it is not normally expressed or at abnormal times of development. Furthermore,
by providing expression of a protein in cells in which it is otherwise not
normally produced, one can induce changes in cellular behavior.
[00093] Protein molecules may be assayed to investigate
structure/function parameters. For example, by providing for the production of
large amounts of a protein product of an identified gene, one can identify
ligands or substrates that bind to, modulate or mimic the action of that protein
product. Drug screening identifies agents that provide, e.g., a replacement or
enhancement for protein function in affected cells, or agents that modulate or
negate protein function. The term "agent" as used herein describes any
molecule, e.g. nucleic acid, protein or small molecule, with the capability of
altering or mimicking the physiological function of an identified gene, gene
regulatory region or gene product. Generally a plurality of assays is run in
parallel with different agent concentrations to obtain a differential response to
the various concentrations. Typically, one of these concentrations serves as a
negative control, e.g., at zero concentration or below the level of detection.
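
The parallel-assay arrangement described above can be sketched in a few lines of Python. The example assumes that each assay yields a numeric signal, that replicate readings are averaged, and that the zero-concentration wells serve as the negative control; the function name, the data layout and the signal values are hypothetical and are not taken from the specification.

```python
def differential_response(readings, control_concentration=0.0):
    """Subtract the mean negative-control signal from the mean signal at each concentration."""
    if control_concentration not in readings:
        raise ValueError("no negative control among the readings")
    control_values = readings[control_concentration]
    control_mean = sum(control_values) / len(control_values)
    return {
        conc: sum(values) / len(values) - control_mean
        for conc, values in sorted(readings.items())
        if conc != control_concentration
    }


# Hypothetical signal values (arbitrary units) keyed by agent concentration.
assay_readings = {
    0.0: [1.0, 1.0],    # negative control at zero concentration
    1.0: [2.0, 2.0],
    10.0: [4.0, 6.0],
}
response = differential_response(assay_readings)
```

Here `response` maps each non-control concentration to its signal above the control, which is one simple way to express a differential response across concentrations.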

[00094] A wide variety of assays may be used for this purpose, including
labeled in vitro protein-protein binding assays, protein-DNA binding assays,
electrophoretic mobility shift assays, immunoassays for protein binding, and the
like. All or a fragment of the purified protein may also be used for
determination of three-dimensional crystal structure, which can be used for
modeling intermolecular interactions, membrane fusion, etc.
[00095] Candidate agents encompass numerous chemical classes, though
typically they are organic molecules, preferably small organic molecules,
having a molecular weight of more than 50 and less than about 2,500 daltons.
Candidate agents comprise functional groups necessary for structural interaction
with proteins, particularly hydrogen bonding, and typically include at least an
amine, carbonyl, hydroxyl or carboxyl group, and frequently at least two of the
functional chemical groups. The candidate agents often comprise cyclical
carbon or heterocyclic structures and/or aromatic or polyaromatic structures
substituted with one or more of the above functional groups. Candidate agents
are also found among biomolecules including, but not limited to: peptides,
saccharides, fatty acids, steroids, purines, pyrimidines, benzodiazepines,
derivatives, structural analogs or combinations thereof.

[00096] Candidate agents are obtained from a wide variety of sources
including libraries of synthetic or natural compounds. For example, numerous
means are available for random and directed synthesis of a wide variety of
organic compounds and biomolecules, including expression of randomized
oligonucleotides and oligopeptides. Alternatively, libraries of natural
compounds in the form of bacterial, fungal, plant and animal extracts are
available or readily produced. Additionally, natural or synthetically produced
libraries and compounds are readily modified through conventional chemical,
physical and biochemical means, and may be used to produce combinatorial
libraries. Known pharmacological agents may be subjected to directed or
random chemical modifications, such as acylation, alkylation, esterification,
amidification, etc., to produce structural analogs.

[00097] Where the screening assay is a binding assay, one or more of the
molecules may be joined to a label, where the label can directly or indirectly
provide a detectable signal. Various labels include radioisotopes, fluorescers,
chemiluminescers, enzymes, specific binding molecules, particles, e.g.,
magnetic particles, and the like. Specific binding molecules include pairs, such
as biotin and streptavidin, digoxin and antidigoxin, etc. For the specific binding
members, the complementary member would normally be labeled with a
molecule that provides for detection, in accordance with known procedures.
[00098] A variety of other reagents may be included in the screening
assay. These include reagents like salts, neutral proteins, e.g. albumin,
detergents, etc., that are used to facilitate optimal protein-protein binding and/or
reduce non-specific or background interactions. Reagents that improve the
efficiency of the assay, such as protease inhibitors, nuclease inhibitors,
anti-microbial agents, etc., may be used.

[00099] Agents, including any other compound or molecule disclosed,
may be combined with a pharmaceutically acceptable carrier. Pharmaceutical
carriers include, for example, any and all solvents, dispersion media, coatings,
anti-oxidant, isotonic and absorption delaying agents and the like. The use of
such carriers and agents in pharmaceutical compounds is well known in the art.
Except insofar as any conventional carrier or agent is incompatible with the
active ingredient, its use in the therapeutic compositions and methods described
herein is contemplated. Supplementary active ingredients can also be
incorporated into the pharmaceutical compounds disclosed.
[000100] Formulations of pharmaceutical compounds may be prepared for
use in various methods for administration. The formulation may be given 
orally, by inhalation, or may be injected, e.g. intravascular, intratumor, 
subcutaneous, intraperitoneal, intramuscular, etc. The dosage of the therapeutic 
formulation will vary widely, depending upon the nature of the disease, the 
frequency of administration, the manner of administration, the clearance of the 
agent from the host, and the like. The initial dose may be larger, followed by 
smaller maintenance doses. The dose may be administered as infrequently as 
weekly or biweekly, or fractionated into smaller doses and administered daily, 
semi-weekly, etc., to maintain an effective dosage level. In some cases, oral 
administration will require a higher dose than if administered intravenously. 
Identified agents of the invention can be incorporated into a variety of 
formulations for therapeutic administration. More particularly, the complexes 
can be formulated into pharmaceutical compositions by combination with 
appropriate, pharmaceutically acceptable carriers or diluents, and may be
formulated into preparations in solid, semi-solid, liquid or gaseous forms, such
as tablets, capsules, powders, granules, ointments, solutions, suppositories,
injections, inhalants, gels, microspheres, and aerosols. As such, administration 
of the agents can be achieved in various ways. Agents may be systemic after 
administration or may be localized by the use of an implant that acts to retain 
the active dose at the site of implantation. 

[000101] For oral preparations, an agent can be used alone or in 
combination with appropriate additives to make tablets, powders, granules or
capsules, for example, with conventional additives, such as lactose, mannitol, 
corn starch or potato starch; with binders, such as crystalline cellulose, cellulose 
derivatives, acacia, corn starch or gelatins; with disintegrators, such as corn 
starch, potato starch or sodium carboxymethylcellulose; with lubricants, such as 
talc or magnesium stearate; and if desired, with diluents, buffering agents, 
moistening agents, preservatives and flavoring agents. 
[000102] Additionally, agents may be formulated into preparations for 
injections by dissolving, suspending or emulsifying them in an aqueous or 
nonaqueous solvent, such as vegetable or other similar oils, synthetic aliphatic 
acid glycerides, esters of higher aliphatic acids or propylene glycol; and if 
desired, with conventional additives such as solubilizers, isotonic agents, 
suspending agents, emulsifying agents, stabilizers and preservatives. Further, 
agents may be utilized in aerosol formulation to be administered via inhalation. 
The compounds of the present invention can be formulated into pressurized 
acceptable propellants such as dichlorodifluoromethane, propane, nitrogen and 
the like. Furthermore, agents may be made into suppositories by mixing with a 
variety of bases such as emulsifying bases or water-soluble bases. 
Alternatively, identified agents of the present invention can be administered 
rectally via a suppository. The suppository can include vehicles such as cocoa 
butter, carbowaxes and polyethylene glycols, which melt at body temperature,
yet are solid at room temperature. 

[000103] Implants for sustained release formulations are well known in the 
art. Implants are formulated as microspheres, slabs, etc. with biodegradable or 
non-biodegradable polymers. For example, polymers of lactic acid and/or 
glycolic acid form an erodible polymer that is well tolerated by the host. The 
implant containing identified agents of the present invention may be placed in 



WO 03/083442 PCT/US03/09832 

proximity to the site of action, so that the local concentration of active agent is 
increased relative to the rest of the body. Unit dosage forms for oral or rectal
administration such as syrups, elixirs, and suspensions may be provided wherein
each dosage unit, for example, teaspoonful, tablespoonful, gel capsule, tablet or
suppository, contains a predetermined amount of the compositions of the
present invention. Similarly, unit dosage forms for injection or intravenous
administration may comprise the compound of the present invention in a
composition as a solution in sterile water, normal saline or another
pharmaceutically acceptable carrier. The specifications for the novel unit
dosage forms of the present invention depend on the particular compound
employed and the effect to be achieved, and the pharmacodynamics associated
with each active agent in the host.

[000104] The pharmaceutically acceptable excipients, such as vehicles, 
adjuvants, carriers or diluents, are readily available to the public. Moreover, 
pharmaceutically acceptable auxiliary substances, such as pH adjusting and
buffering agents, tonicity adjusting agents, stabilizers, wetting agents and the 
like, are readily available to the public. 

[000105] A therapeutic dose of an identified agent is administered to a 
host suffering from a disease or disorder. Administration may be topical, 
localized or systemic, depending on the specific disease. The compounds are
administered at an effective dosage that over a suitable period of time 
substantially arrests the disease progression. It is contemplated that the 
composition will be obtained and used under the guidance of a physician for in 
vivo use. 

[000106] The dose will vary depending on the specific agent and
formulation utilized, type of disorder, patient status, etc., at a dose sufficient to
address the disease or symptoms thereof, while minimizing side effects. 
Treatment may be for short periods of time, e.g., after trauma, or for extended 
periods of time, in the prevention or treatment of schizophrenia. 

[000107] In addition or in the alternative, at steps 168 and 268, the
phenotypic trait of drug response is used to stratify, or segregate, patients into
various groups. The groups may be, for example, those that respond to a drug
versus those that do not respond, or those that respond to a drug without toxic
effects versus those that are observed to have toxic effects. At steps 170 and
270, the therapeutic may be marketed with an associated diagnostic that is
capable of segregating those that will respond in an acceptable manner to the
drug from those that do not. In a preferable embodiment, the rights to such
drugs or other products are acquired in step 197 or 297, preferably prior to 
demonstration of ability to genetically stratify patients. Those rights, as well as 
the value-added genetic information on how to stratify patients, can then be 
licensed to a third-party for milestone payments and/or royalties. The drug or 
other products (with or without the related diagnostic kit) can also be 
commercialized independently or with partner(s) in order to generate product 
sales. 
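
The stratification of patients by drug response can be pictured with a short Python sketch. It assumes each patient record carries a genotype at a single marker together with observed response and toxicity flags; the field names, category labels and cohort data are hypothetical and serve only to show how the groups described above might be tabulated against a genetic marker.

```python
from collections import Counter, defaultdict


def response_category(patient):
    """Assign a patient to one of three illustrative drug-response groups."""
    if patient["toxic_effects"]:
        return "toxic_effects"
    return "responder" if patient["responded"] else "non_responder"


def stratify(patients):
    """Group patient ids by response category."""
    groups = defaultdict(list)
    for patient in patients:
        groups[response_category(patient)].append(patient["id"])
    return dict(groups)


def genotype_counts(patients):
    """Count marker genotypes within each response category."""
    counts = defaultdict(Counter)
    for patient in patients:
        counts[response_category(patient)][patient["genotype"]] += 1
    return {category: dict(c) for category, c in counts.items()}


# Hypothetical cohort; genotypes refer to a single assumed marker.
cohort = [
    {"id": "p1", "genotype": "AA", "responded": True, "toxic_effects": False},
    {"id": "p2", "genotype": "AG", "responded": True, "toxic_effects": False},
    {"id": "p3", "genotype": "GG", "responded": False, "toxic_effects": False},
    {"id": "p4", "genotype": "GG", "responded": True, "toxic_effects": True},
]
groups = stratify(cohort)
counts = genotype_counts(cohort)
```

A table such as `counts` is the kind of input an associated diagnostic could draw on, although a real study would require far larger cohorts and a statistical test of association.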

[000108] According to one aspect of the invention herein, the technology
platform used for one or more of the scanning steps 120, 201, 257, 263, 265
and/or the correlation steps 150, 152, 260, 263, 265 and 269 is made available
at a low or discounted price to the organization conducting the research
discussed herein, preferably exclusively for at least a period of time in a
specified field. In return, the technology provider receives from the business
organization discussed herein one or more of equity, royalties on discoveries,
licenses to the content generated in one or more of steps 120, 150, 152, 260, 
263, 265 and 269 in a selected field and/or improvements to the technology 
platform, such as improvements made to chip technology. In an alternative 
embodiment, the organization conducting the research discussed herein is 
formed as a "tracking stock" of the technology provider. In preferred 
embodiments, the technology provider does not maintain control (as defined by 
the relevant accounting standard) of the organization performing the business 
methods elsewhere discussed herein. In this aspect of the invention, the early 
financial losses of the research organization need not be consolidated with the 
technology provider. 

[000109] It is to be understood that the above description is intended to be 
illustrative and not restrictive. The scope of the invention should, therefore, be 
determined not with reference to the above description, but instead with 
reference to the appended claims along with the full scope of equivalents 
thereto.

WHAT IS CLAIMED IS:

1. A business method comprising:

a. scanning the human genome for single nucleotide
polymorphisms across more than 10,000,000 bases including
both genic and non-genic regions;

b. grouping said single nucleotide polymorphisms into 
haplotype blocks; 

c. using said haplotype blocks in association studies with a 
phenotypic state; 

d. using associations from step c in a discovery process; and 

e. collaboratively or independently, marketing products from 
said discovery process. 

2. A business method comprising: 

a. scanning the human genome for common single nucleotide
polymorphisms across more than 10,000,000 bases; 

b. grouping said common single nucleotide polymorphisms into 
haplotype blocks; and 

c. collaboratively with a partner, using said haplotype blocks in 
association studies with a phenotypic state. 

3. The methods as recited in claims 1 or 2 wherein said scanning 
step does not scan repeat genetic regions. 

4. The methods as recited in claim 2 wherein said scanning step 
scans genic and non-genic regions of sequences in a database.

5. The methods as recited in claims 1 or 2 wherein said scanning 
step scans more than 50,000,000 bases. 

6. The methods as recited in claims 1 or 2 wherein said scanning 
step scans more than 100,000,000 bases. 

7. The methods as recited in claims 1 or 2 wherein said scanning 
step scans more than 500,000,000 bases.

8. The methods as recited in claims 1 or 2 wherein said scanning
step scans more than 1,000,000,000 bases.

9. The methods as recited in claims 1 or 2 wherein only single
nucleotide polymorphisms that occur in more than 2% of a
sample population studied are used in the grouping step.

10. The methods as recited in claims 1 or 2 wherein only single
nucleotide polymorphisms that occur in more than 4% of a
sample population studied are used in the grouping step.

11. The methods as recited in claims 1 or 2 wherein only single
nucleotide polymorphisms that occur in more than 6% of a
sample population studied are used in the grouping step.

12. The methods as recited in claims 1 or 2 wherein only single 
nucleotide polymorphisms that occur in more than 8% of a 
sample population studied are used in the grouping step. 

13. The methods as recited in claims 1 or 2 wherein only single
nucleotide polymorphisms that occur in more than 10% of a
sample population studied are used in the grouping step. 

14. The method as recited in claims 1 or 2 wherein said phenotypic 
state is a disease state. 

15. The method as recited in claims 1 or 2 wherein said step of using
said haplotype blocks in association studies comprises scanning
for at least some of said single nucleotide polymorphisms in said
haplotype blocks on a nucleic acid array.

16. The method as recited in claims 1 or 2 wherein said step of using
said haplotype blocks in association studies comprises scanning
for at least some of said single nucleotide polymorphisms in said
haplotype blocks using an assay to identify single nucleotide
polymorphisms.

17. The method as recited in claims 1 or 2 wherein said phenotypic
state is a drug response state.

18. The method as recited in claim 17 wherein said drug response
state is a responder state.

19. The method as recited in claim 17 wherein said drug response 
state is a toxicity state. 

20. The method as recited in claim 17 further comprising the step of
marketing a drug with a diagnostic test that stratifies patients
taking said drug.

21. The method as recited in claims 1 or 2 wherein said scanning
step scans genomic DNA or derivatives thereof for single
nucleotide polymorphisms from more than 10 sources.

22. The method as recited in claims 1 or 2 wherein said scanning
step scans genomic DNA or derivatives thereof for single
nucleotide polymorphisms from more than 20 sources.

23. The method as recited in claims 1 or 2 wherein said scanning
step scans genomic DNA or derivatives thereof for single
nucleotide polymorphisms from more than 30 sources.

24. The method as recited in claims 1 or 2 wherein said scanning 
step scans genomic DNA or derivatives thereof for single 
nucleotide polymorphisms from more than 50 sources. 

25. The method as recited in claims 1 or 2 wherein genic and
non-genic regions are scanned in a second organism and used in
comparative studies between said second organism and said
human.

26. The method as recited in claims 1 or 2 wherein only common
polymorphisms are used in said grouping step.

27. The method as recited in claim 1 wherein said association studies 
are performed with a partner. 

28. The method as recited in claims 1 or 2 wherein more than 2 
chromosomes are analyzed in said scanning step. 

29. The method as recited in claims 1 or 2 wherein more than 5
chromosomes are analyzed in said scanning step.

30. The method as recited in claims 2 or 27 wherein said partner
provides funding for said scanning and grouping steps.

31. The method as recited in claims 2 or 27 wherein said partner
provides funding for said association studies.

32. The method as recited in claims 1 or 2 wherein said association 
studies evaluate more than 500 disease and control samples. 

33. The method as recited in claims 1 or 2 where said association 
studies evaluate more than 1000 disease and control samples. 

34. The method as recited in claims 2 or 27 wherein said partner
provides samples used in said association studies.

35. The method as recited in claims 1 or 2 further comprising the 
step of identifying a pharmaceutical compound to address said 
phenotypic state. 

36. The method as recited in claim 35 further comprising the step of
collecting royalties based on sales of said pharmaceutical
compound.

37. The method as recited in claims 1 or 2 further comprising the
step of segregating a drug patient population with results from
said association studies.

38. The method as recited in claims 1 or 2 wherein a technology 
provider provides discounted technology for at least one of said 
scanning step or said association studies and receives equity in 
return for said discounted technology. 

39. The method as recited in claim 38 wherein said technology
provider receives content from at least one of said scanning steps
or association studies for use on said technology.

40. The method as recited in claim 35 wherein said pharmaceutical
compound is an antisense molecule.

41. The method as recited in claim 35 wherein said pharmaceutical
compound is a small organic molecule.

42. The method as recited in claim 35 wherein said pharmaceutical
compound is a protein compound.

43. The method as recited in claim 35 wherein said pharmaceutical
compound is an antibody.

44. The method as recited in claims 1 or 34 wherein said association 
study is run without a prior hypothesis of a genetic association of 
said phenotypic state. 

45. The method as recited in claims 1 or 2 wherein said association 
studies are run with only representative single nucleotide 
polymorphisms within said haplotypes. 

46. The method as recited in claims 1 or 2 wherein said association 
studies use less than 300,000 nucleotide polymorphisms but 
provide potential associations across all chromosomes of an 
organism. 

47. The method as recited in claims 1 or 2 further comprising the 
step of entering into commercial transactions wherein target 
genomic regions are evaluated with said haplotype blocks. 

48. The method as recited in claims 1 or 2 wherein said association 
studies provide a plurality of potential genetic associations with 
said phenotypic state. 

49. The method as recited in claim 48 further comprising the step of 
conducting drug discovery processes on at least a part of said 
potential genetic associations and eliminating some of said 
genetic associations from further discovery processes. 

50. The method as recited in claims 1 or 2 further comprising the 
step of licensing a database of said haplotypes. 

51. The method as recited in claims 1 or 2 comprising the step of 
using said haplotypes in association contracts. 

52. The method as recited in claim 51 comprising the step of using 
revenue from said association contracts in internal 
pharmaceutical development efforts. 

53. The method as recited in claims 1 or 2 further comprising the 
step of obtaining milestone payments on drugs developed based 
on said association studies. 

54. The method as recited in claims 1 or 2 wherein said scanning 
step uses a technology platform from a technology provider, said 
technology provider providing said technology platform at a 
discounted price in return for equity. 

55. The method as recited in claims 1 or 2 wherein said scanning 
step uses a technology platform from a technology provider, said 
technology provider providing said technology platform at a 
discounted price in return for data from at least one of said 
scanning step or said grouping step. 

56. The method as recited in claims 1 or 2 wherein said scanning 
step uses a technology platform from a technology provider, said 
technology provider providing said technology platform at a 
discounted price in return for rights to improvements to said 
technology platform. 

57. The method as recited in claim 35 wherein a partner pays 
royalties on said compounds. 

58. The method as recited in claims 1 or 2 wherein said scanning 
step scans genomes from ethnically diverse individuals. 

59. The method as recited in claims 1 or 2 wherein said scanning 
step scans haploid genomes. 

60. The method as recited in claims 1 or 2 wherein said scanning 
step is repeated using results of said association study to identify 
further said single nucleotide polymorphisms in a region of 
interest. 

61. A business method comprising: 

a. identifying genetic variations in a plurality of individuals; 

b. identifying at least some of said genetic variations that occur 
with at least some other genetic variations; and 

c. using some, but not all, of said variations that occur with at 
least some others of said genetic variations in correlation 
with a phenotypic state in collaborative agreements. 

62. A business method comprising: 

a. individually scanning a plurality of genomes to identify 
common variants; 

b. pooling samples of case and control DNAs to form pooled 
DNAs; 

c. identifying a relative allele frequency of said common 
variants in said pooled DNAs to identify variants more 
common in one of said case or control pooled DNAs; 

d. using said variants identified as more common in a discovery 
process; and 

e. collaboratively or independently marketing products from 
said discovery process. 
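
By way of illustration only, the comparison of step (c) of claim 62 may be sketched as follows. The sketch assumes that an allele frequency estimate is already available for each common variant in the pooled case DNA and in the pooled control DNA. The names `PooledVariant` and `enriched_variants`, the variant identifiers, and the 0.05 frequency-difference threshold are hypothetical and are not taken from this application; the claims do not prescribe any particular threshold or statistical test.

```python
from dataclasses import dataclass


@dataclass
class PooledVariant:
    variant_id: str      # hypothetical identifier for a common variant
    case_freq: float     # estimated allele frequency in the pooled case DNA
    control_freq: float  # estimated allele frequency in the pooled control DNA


def enriched_variants(variants, min_difference=0.05):
    """Return (variant_id, pool, difference) for variants whose allele
    frequency differs between pools by at least min_difference.

    min_difference is an assumed, illustrative cutoff.
    """
    hits = []
    for v in variants:
        diff = v.case_freq - v.control_freq
        if abs(diff) >= min_difference:
            # Record which pool carries the allele more frequently.
            pool = "case" if diff > 0 else "control"
            hits.append((v.variant_id, pool, round(abs(diff), 4)))
    # Largest frequency differences first.
    hits.sort(key=lambda h: h[2], reverse=True)
    return hits


if __name__ == "__main__":
    example = [
        PooledVariant("snp_a", 0.42, 0.30),
        PooledVariant("snp_b", 0.18, 0.19),
        PooledVariant("snp_c", 0.10, 0.25),
    ]
    for variant_id, pool, difference in enriched_variants(example):
        print(variant_id, pool, difference)
```

Variants returned by such a comparison would correspond to the "variants more common in one of said case or control pooled DNAs" that are carried forward into the discovery process of step (d).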